The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening

Authors

  • Rafał Kurczab
  • Andrzej J Bojarski
Abstract

The machine learning-based virtual screening of molecular databases is a commonly used approach to identify hits. However, many aspects associated with training predictive models can influence the final performance and, consequently, the number of hits found. Thus, we performed a systematic study of the simultaneous influence of the proportion of negatives to positives in the testing set, the size of screening databases and the type of molecular representations on the effectiveness of classification. The results obtained for eight protein targets, five machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest), two types of molecular fingerprints (MACCS and CDK FP) and eight screening databases with different numbers of molecules confirmed our previous findings that increases in the ratio of negative to positive training instances greatly influenced most of the investigated parameters of the ML methods in simulated virtual screening experiments. However, the performance of screening was shown to also be highly dependent on the molecular library dimension. Generally, with the increasing size of the screened database, the optimal training ratio also increased, and this ratio can be rationalized using the proposed cost-effectiveness threshold approach. To increase the performance of machine learning-based virtual screening, the training set should be constructed in a way that considers the size of the screening database.
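The central variable in the abstract, the ratio of negative to positive training instances, can be illustrated with a minimal sketch. It assumes negatives are drawn at random from a larger pool; the function name `build_training_set`, the placeholder identifiers, and the example ratios are hypothetical and are not taken from the paper.

```python
import random


def build_training_set(positives, negative_pool, ratio, seed=0):
    """Return (samples, labels) with `ratio` negatives per positive.

    Hypothetical helper: negatives are sampled without replacement
    from `negative_pool` using a fixed seed for reproducibility.
    """
    n_neg = int(round(ratio * len(positives)))
    if n_neg > len(negative_pool):
        raise ValueError("negative pool too small for requested ratio")
    rng = random.Random(seed)
    negatives = rng.sample(negative_pool, n_neg)
    samples = list(positives) + negatives
    labels = [1] * len(positives) + [0] * n_neg
    return samples, labels


# Placeholder identifiers stand in for fingerprinted molecules (assumption).
positives = [f"active_{i}" for i in range(10)]
negative_pool = [f"decoy_{i}" for i in range(1000)]

# Illustrative ratios only; the values studied in the paper are not given here.
for ratio in (1, 5, 10, 50):
    samples, labels = build_training_set(positives, negative_pool, ratio)
    print(ratio, len(samples), sum(labels))
```

Each resulting set could then be passed to a classifier of choice and evaluated against screening databases of different sizes, which is the comparison the abstract describes.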

Similar resources

The influence of negative training set size on machine learning-based virtual screening

BACKGROUND The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. RESULTS The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. A...

Novel Small Molecules against Two Binding Sites of Wnt2 Protein as potential Drug Candidates for Colorectal Cancer: A Structure Based Virtual Screening Approach

Wnts are the major ligands responsible for activating the Wnt signaling pathway through binding to Frizzled proteins (Fzd) as the receptors. Among these ligands, Wnt2 plays the main role in the tumorigenesis of several human cancers, especially colorectal cancer (CRC). Therefore, it can be considered a potential drug target. The aim of this study was to identify potential drug candidates ...

Pharmacophore Based Virtual Screening Approach to Identify Selective PDE4B Inhibitors

Phosphodiesterase 4 (PDE4) has been established as a promising target in asthma and chronic obstructive pulmonary disease. PDE4B subtype selective inhibitors are known to reduce the dose limiting adverse effect associated with non-selective PDE4B inhibitors. This makes the development of PDE4B subtype selective inhibitors a desirable research goal. To achieve this goal, ligand based pharmacophore m...

Journal title:

Volume 12  Issue 

Pages  -

Publication date 2017